Remove len() calls from compress paths #520
…code paths. A (very) minor perf. improvement, whether or not compression is used: do not call len() on the buffer redundantly. Signed-off-by: Yaniv Kaul <[email protected]>
Pull Request Overview
This PR optimizes compression paths by eliminating redundant len() calls on data that will be compressed. The changes store the length of data in variables before compression and pass it as a parameter to compression functions, avoiding multiple length calculations on the same data.
- Modified compression function signatures to accept a length parameter
- Updated compression logic to pass pre-calculated lengths instead of recalculating after compression
- Fixed a typo in a docstring ("one of more" → "one or more")
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cassandra/connection.py | Updated LZ4 and Snappy compression functions to accept length parameter |
| cassandra/protocol.py | Modified message encoding to pre-calculate body length and pass to compressor |
| cassandra/segment.py | Updated segment compression to use pre-calculated lengths and fixed compression logic |
| tests/unit/test_segment.py | Updated test cases to use new compression function signature |
In essence, we know the length of the uncompressed payload, so pass it along. To do so, I changed the signature of the function (and did the same for snappy, where the new parameter is a dummy). Signed-off-by: Yaniv Kaul <[email protected]>
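A minimal sketch of the idea described in this commit message, not the driver's actual code: the caller measures the uncompressed body once and hands that length to the compressor. All names here (`compress_with_length`, `compress_ignoring_length`, `encode_body`) are hypothetical, `zlib` stands in for LZ4/Snappy so the example runs with the standard library only, and the 4-byte big-endian length header is an assumption made for illustration.

```python
import struct
import zlib


def compress_with_length(body, body_len):
    """Prefix the compressed payload with the uncompressed length.

    zlib is a stand-in for LZ4 here; the header layout is an assumption.
    """
    return struct.pack(">i", body_len) + zlib.compress(body)


def compress_ignoring_length(body, body_len):
    """Same signature, but the length is a dummy (as described for snappy)."""
    return zlib.compress(body)


def encode_body(body, compressor=None):
    """Measure the body once and reuse that value everywhere it is needed."""
    body_len = len(body)  # the only len() call on the uncompressed body
    if compressor is not None:
        body = compressor(body, body_len)
    return body, body_len
```

With this shape, both compressors can be swapped freely because they share one signature, and the uncompressed length is computed in a single place.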
|
To my surprise, very similar code exists in GoCQL.... |
|
CI failure looks unrelated (especially as no compression was involved, but who knows):
=================================== FAILURES ===================================
______ LightweightTransactionTests.test_no_connection_refused_on_timeout _______

self = <tests.integration.standard.test_query.LightweightTransactionTests testMethod=test_no_connection_refused_on_timeout>

    def test_no_connection_refused_on_timeout(self):
        """
        Test for PYTHON-91 "Connection closed after LWT timeout"
        Verifies that connection to the cluster is not shut down when timeout occurs.
        Number of iterations can be specified with LWT_ITERATIONS environment variable.
        Default value is 1000
        """
        insert_statement = self.session.prepare("INSERT INTO test3rf.lwt (k, v) VALUES (0, 0) IF NOT EXISTS")
        delete_statement = self.session.prepare("DELETE FROM test3rf.lwt WHERE k = 0 IF EXISTS")

        iterations = int(os.getenv("LWT_ITERATIONS", 1000))

        # Prepare series of parallel statements
        statements_and_params = []
        for i in range(iterations):
            statements_and_params.append((insert_statement, ()))
            statements_and_params.append((delete_statement, ()))

        received_timeout = False
        results = execute_concurrent(self.session, statements_and_params, raise_on_first_error=False)
        for (success, result) in results:
            if success:
                continue
            else:
                # In this case result is an exception
                exception_type = type(result).__name__
                if exception_type == "NoHostAvailable":
                    pytest.fail("PYTHON-91: Disconnected from Cassandra: %s" % result.message)
                if exception_type in ["WriteTimeout", "WriteFailure", "ReadTimeout", "ReadFailure", "ErrorMessageSub"]:
                    if type(result).__name__ in ["WriteTimeout", "WriteFailure"]:
                        received_timeout = True
                    continue

                pytest.fail("Unexpected exception %s: %s" % (exception_type, result.message))

        # Make sure test passed
>       assert received_timeout
E       assert False

tests/integration/standard/test_query.py:957: AssertionError |
Very similar changes to the Python driver (scylladb/python-driver#520 ), there are redundant calls to len(). Removed them. Follow-up patch will hopefully change the Encode() functions to accept the length of the buffer, to remove another len() call (just as done in the Python change mentioned above). Signed-off-by: Yaniv Kaul <[email protected]>
|
I don't think it makes sense to complicate the API to save a couple of CPU ticks; if anything, this optimization should be done by Python or Cython. |
I'm not sure what API I'm changing here (it's never clear to me what is considered an API in our Python driver - documentation is poor). I think a couple of CPU ticks + improved readability (IMHO) is a good direction. Nevertheless, if you don't agree with the approach, OK. Closing. |
|
We could bypass the "API" breakage by setting a default and/or handling missing values for each new parameter that was introduced to these 2 functions.
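A rough sketch of what that backward-compatible variant might look like, assuming a hypothetical `compress_compat` function (not a name from the driver) and using `zlib` as a stand-in compressor with an assumed 4-byte length header. Existing callers that pass only the buffer keep working, while callers that already know the length can skip the extra `len()` call.

```python
import struct
import zlib


def compress_compat(body, body_len=None):
    """Accept an optional pre-computed length; fall back to len() if absent."""
    if body_len is None:
        # Old-style call: the caller did not supply the length.
        body_len = len(body)
    return struct.pack(">i", body_len) + zlib.compress(body)


# Both call styles produce the same framed payload.
old_style = compress_compat(b"hello world")
new_style = compress_compat(b"hello world", 11)
```

The trade-off is one extra `None` check per call in exchange for not changing the behaviour seen by any existing caller.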
The problem is not in the breaking API, but rather in complicating it without good reason. Also this API ought to be changed to be streaming. |
I don't fully understand what API you refer to. I don't think I've touched any public API. |